Entry Name:  "TJU-Wang-MC1"

VAST Challenge 2015
Mini-Challenge 1

 

 

Team Members:

Wuquan Wang, Tianjin University, China, wwq0806@tju.edu.cn   PRIMARY

Jie Xu, Tianjin University, China, xujie_nm@tju.edu.cn    

Dongni Hu, Tianjin University, China, dongnihu@tju.edu.cn

Panjiao Yan, Tianjin University, China, yanpanjiao@tju.edu.cn

Zhenbao Fan, Tianjin University, China, fanzhenbao@tju.edu.cn

Jie Li, Tianjin University, China, vassilee@tju.edu.cn   SUPERVISOR

Kang Zhang, Tianjin University, China, kzhang@tju.edu.cn  SUPERVISOR

 

Student Team: YES

 

Did you use data from both mini-challenges?

NO

 

Analytic Tools Used:

D3

MySQL

C++

Excel

SPSS

 

Approximately how many hours were spent working on this submission in total?

About 60 hours (60 days at 1 hour per day)

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete?

YES

 

 

Video:

index.files\TJU-Wang-MC1-Demo.wmv

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend. 

 

    1. How big is this type of group?
    2. Where does this type of group like to go in the park?
    3. How common is this type of group?
    4. What are your other observations about this type of group?
    5. What can you infer about this type of group?
    6. If you were to make one improvement to the park to better meet this group’s needs, what would it be?  

Limit your response to no more than 12 images and 1000 words.

 

Before answering the question on group characteristics, we first identify groups of individuals using statistical clustering methods. We put any two individuals into one group if they were together for more than 7th of the entire period in the park. We next allocate all the groups into 12 different types based on the average check-in numbers in each facility, using the K-means clustering algorithm. Finally, we visualize the clustering results to analyze whether our strategy is correct.
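The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual program: the track layout (a mapping from timestamp to position per visitor), the function names, the transitive merging of pairs with a union-find structure, and the simple first-k initialization of K-means are all hypothetical. The co-presence threshold is left as a parameter.

```python
from itertools import combinations


def together_fraction(track_a, track_b):
    """Fraction of all observed timestamps at which both visitors share a position."""
    all_times = set(track_a) | set(track_b)
    if not all_times:
        return 0.0
    shared = sum(1 for t in all_times
                 if t in track_a and t in track_b and track_a[t] == track_b[t])
    return shared / len(all_times)


def find_groups(tracks, threshold):
    """Merge visitors whose pairwise together-fraction exceeds `threshold`.

    tracks: {visitor_id: {timestamp: (x, y)}} (assumed layout).
    Pairs are merged transitively, which is an assumption of this sketch.
    """
    parent = {vid: vid for vid in tracks}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in combinations(tracks, 2):
        if together_fraction(tracks[a], tracks[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for vid in tracks:
        groups.setdefault(find(vid), set()).add(vid)
    # Keep only real groups (two or more people).
    return [g for g in groups.values() if len(g) >= 2]


def kmeans(points, k, iterations=20):
    """Plain K-means; centers start at the first k points (a simplification)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move every non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Here each point passed to `kmeans` would be one group's vector of average check-in counts per facility, with `k=12`. The pairwise comparison is quadratic in the number of visitors, so a real run would need to restrict the candidate pairs first.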

 

a. The size of each type of group.

Instead of merging the three days’ data, we process each day’s data separately. This is because many people borrowed devices in the park rather than installing the app on their own phones. Moreover, when an ID appears on Friday and again on Saturday, it is unclear whether it is the same person or whether IDs are only unique within a day.

As shown in Figure 1-1, there are 12 different types of groups, drawn in 12 different colors. Each circle represents a group, and its size represents the number of individuals in the group. Pointing the black arrow at a circle opens a popup showing “typeID”; “typeSize”, the total number of people of this type; “number of groups”, the total number of groups of this type; “Min”, the smallest group size within this type; and “Max”, the largest group size within this type. We find that, in all 12 types, the smallest groups contain 2 people. The check-in information of the different groups is shown in the radar chart.

 


     Fig.1-1 12 different types of groups

b. Places where different types of groups like to go.

  Fig. 1-2-1 shows the places where different types of groups like to go in the park. The X-axis represents the 12 classified types of groups, the Y-axis represents 6 places in the park, and the Z-axis represents the average number of visits per person to each place.

  First, we find that most types of groups like to visit Thrill Rides. The average number of visits to Thrill Rides is significantly higher than that to other places.


Fig.1-2-1 Distribution of types of groups in all facilities
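The Z-axis value in Fig. 1-2-1 is a simple ratio, which the sketch below illustrates. It assumes that check-ins are available as (type, place) pairs and that the headcount of each type is known. The function name and both input layouts are hypothetical.

```python
from collections import Counter


def average_visits(checkins, members_per_type):
    """Average number of visits per person, for every (type, place) pair.

    checkins: iterable of (type_id, place) pairs, one per check-in (assumed layout).
    members_per_type: {type_id: number of people of that type}.
    """
    counts = Counter(checkins)
    return {
        (type_id, place): n / members_per_type[type_id]
        for (type_id, place), n in counts.items()
    }


checkins = [
    (1, "Thrill Rides"), (1, "Thrill Rides"), (1, "Kiddie Rides"),
    (2, "Shows"),
]
print(average_visits(checkins, {1: 2, 2: 4}))
```

Dividing by the headcount of the type, rather than by the number of people who checked in, keeps members with no check-ins in the average.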

Second, we look into individual types. Figure 1-2-2 shows the information of the 3rd type on Friday, the 8th type on Saturday, and the 7th type on Sunday. In contrast, the 8th type on Sunday is more interested in the shows.


Fig.1-2-2 The 3rd type on Friday, 8th type on Saturday, 7th type on Sunday

c. The commonality of each type of group.

It is necessary to discuss what each type has in common and how the types differ from one another.

Figure 1-3-1(a) shows how many people are in each type of group during the three days.

Figure 1-3-1(b) shows how many groups are in each type.

Figure 1-3-1(c) shows how many individuals are in the largest group of each type.


Fig.1-3-1 (a) GroupSize; (b) GroupNum; (c) Max

Fig.1-3-2 shows an example: the commonalities and differences between the 8th type (purple) and the 12th type (pink) in their Friday activities.


Fig. 1-3-2 The 8th and 12th types on Friday

d. Other observations on different types of groups.

Using the force layout in Figure 1-4, we can observe that most types of groups like to visit all the facilities, while a few types like to visit specific facilities.


Fig.1-4 (a) Fri force mapping; (b) Sat force mapping; (c) Sun force mapping

 

e. Inference on the types of groups.

1. Based on the above information, we infer that the types that prefer Thrill Rides but are not interested in other facilities are probably young people.

2. The type of groups interested in Kiddie Rides may be parents and children.

3. The type only paying attention to the daily shows may be fans of Scott Jones.

f. Suggestions to the park to better meet this type’s needs

In DinoFun World, the most popular facility type is Thrill Rides, while the least popular is Kiddie Rides. The number of people visiting Thrill Rides increased from Friday to Sunday, while more people visited Kiddie Rides on Friday than on Saturday or Sunday.

Given these findings, we suggest increasing the number of Thrill Rides to reduce queuing time and attract more visitors. Large gatherings can become dangerous, so we also suggest adding more security staff in the popular areas to ensure visitors’ safety. More security staff should be on duty during the weekend and at show times.

 

MC1.2 – Are there notable differences in the patterns of activity in the park across the three days?  Please describe the notable differences you see.

 

Limit your response to no more than 3 images and 300 words.

 

1. A significantly larger number of people visited the show at the weekend than on Friday. Because the show was held once in the morning and once in the afternoon, we infer that the difference across the three days in the number of people visiting the show is less pronounced than the difference for other types of facilities.


Fig.2-1 Left view during three days

2. The average number of people visiting Kiddie Rides on Saturday and on Sunday is lower than on Friday.

3. We wrote a program to generate a heat map on top of the park map to show the population densities in different parts of the park during the three days. The time unit is 1 minute, and colors encode density, ranging from green (sparse) to red (most dense). Figure 2-2 is a snapshot at 9:30 a.m. on Sunday, showing a few large points, which correspond to the facility entrances. The points are color-coded by their population density, ranging from gray to red. The heat map is animated over time, showing what is happening at each facility.


Fig. 2-2 Heat map at Sunday 9:30 a.m.
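The density computation behind such a heat map can be sketched as below. This is an illustration only, not the program we wrote: the record layout (visitor ID, seconds since midnight, grid coordinates), the function names, and the linear green-to-red ramp are all assumptions.

```python
from collections import defaultdict


def density_per_minute(records):
    """Count distinct visitors per (minute, x, y) cell.

    records: iterable of (visitor_id, seconds_since_midnight, x, y) (assumed layout).
    """
    seen = defaultdict(set)
    for visitor_id, seconds, x, y in records:
        seen[(seconds // 60, x, y)].add(visitor_id)
    return {cell: len(ids) for cell, ids in seen.items()}


def density_color(count, max_count):
    """Map a density to an (r, g, b) tuple on a linear green-to-red ramp."""
    ratio = 0.0 if max_count <= 0 else min(count / max_count, 1.0)
    return (round(255 * ratio), round(255 * (1 - ratio)), 0)


# 34200 seconds after midnight is 9:30 a.m., i.e. minute 570 of the day.
records = [(1, 34200, 10, 20), (2, 34230, 10, 20), (1, 34250, 10, 20)]
density = density_per_minute(records)
print(density[(570, 10, 20)], density_color(2, 2))
```

Counting distinct visitors instead of raw records prevents a person who reports several positions within the same minute from being counted more than once.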

We find that Saturday is the only day on which a person showed movement inside the park without checking in at any of the three gates. Fig. 2-3 displays the differences among the three mornings.


Fig. 2-3 Difference in the three mornings
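A check of this kind, finding IDs that have movement records but no check-in at any gate, can be sketched as follows. The record layout, the record type labels "movement" and "check-in", the function name, and the gate coordinates in the example are assumptions made for illustration.

```python
def ids_without_gate_checkin(records, gates):
    """Return the IDs that moved inside the park but never checked in at a gate.

    records: iterable of (visitor_id, kind, x, y), where kind is "movement"
             or "check-in" (assumed layout).
    gates: set of (x, y) gate positions (hypothetical coordinates).
    """
    moved, entered = set(), set()
    for visitor_id, kind, x, y in records:
        if kind == "movement":
            moved.add(visitor_id)
        elif kind == "check-in" and (x, y) in gates:
            entered.add(visitor_id)
    return moved - entered


gates = {(0, 50), (50, 0)}  # made-up positions, for the example only
records = [
    (1, "check-in", 0, 50), (1, "movement", 1, 50),
    (2, "movement", 5, 5), (2, "check-in", 7, 7),
]
print(ids_without_gate_checkin(records, gates))
```

Running it once per day keeps the result consistent with the decision to process each day's data separately.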

 

MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.

 

Limit your response to no more than 10 images and 500 words.

 

1. Some people did not use any facilities all day long, or used only one or two. We find their purpose in entering the park suspicious.

 


Fig.3-1 Fri force mapping

2. Person 657863, who left on Saturday morning, stayed in the park throughout Friday night. He was part of a group that visited Alvarez Beer Garden on Friday. The other group members, however, left the Garden, leaving him behind. We consider two possible explanations: (1) he and his friends drank beer in the park on Friday evening, and he got drunk and was left behind; (2) this group is related to the crime, and this person stayed to prepare for it.

3. We examined facility No. 63 and ran a linear regression on its data sets for the three days. The similarity between any two of them is generally larger than 0.8. The exception is Saturday afternoon, when the data differ from the others and the similarity is less than 0.5.


Fig.3-2 (a) Fri 63 Check-in; (b) Sat 63 Check-in; (c) Sun 63 Check-in
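One way to compute a similarity score of this kind is the Pearson correlation coefficient between two check-in series, which is the correlation underlying a simple linear regression. The sketch below assumes that this is the measure and that each series holds check-in counts per equal time slot. Both assumptions, the function name, and the example numbers are illustrative only and are not our measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up counts per time slot for two days.
day_a = [10, 20, 30, 40]
day_b = [12, 19, 33, 41]
print(pearson(day_a, day_b))
```

Comparing the morning and afternoon halves of each day separately would localize a drop in similarity to one part of the day.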

4. As mentioned before, one person, ID 657863, has only movement records and no check-in at all.

5.      Another anomaly is that Pavilion No. 32 was closed for a while on Sunday.


Fig.3-3  Sun 32 Check-in

6. IDs 103006, 313073, 657863, 1412235, and 1937843 of the 11th group and IDs 521750, 644885, 1080969, 1600469, 1629516, 1781070, 1787551, and 1935406 of the 550th group, both of Type 11, did not use any facilities on Friday. As a result, Type 11 used the facilities significantly less than the other types.


Fig.3-4 Force-directed layout for Fri

7. Similarly, IDs 1392457 and 1723510 of the 220th group of Type 11 did not use any facilities on Saturday.


Fig.3-5 Force-directed layout for Saturday

8. Among the 1077 groups, IDs 521750, 644885, 1080969, 1600469, 1629516, 1781070, 1787551, and 1935406 did not use any facilities.


Fig.3-6 Force-directed layout for Sunday